Discretization of Continuous-valued Attributes and Instance-based Learning
Abstract
Recent work on discretization of continuous-valued attributes in learning decision trees has produced some positive results. This paper adopts the idea of discretizing continuous-valued attributes and applies it to instance-based learning (Aha, 1990; Aha, Kibler & Albert, 1991). Our experiments have shown that instance-based learning (IBL) usually performs well in continuous-valued attribute domains and poorly in nominal attribute domains. Cost and Salzberg (1993) devised the modified value-difference metric (MVDM), which raises the performance of IBL in nominal attribute domains. This paper explores a way in which continuous-valued and nominal attributes can be treated cohesively in IBL. We introduce an algorithm that combines the discretization of continuous-valued attributes with IB1 (Aha, Kibler & Albert, 1991) using the modified value-difference metric. The empirical results show that the proposed algorithm, IB1-MVDM*, achieves a substantial improvement over C4.5 (Quinlan, 1993), IB1 and IB1-MVDM in most of the domains tested. A performance comparison is also made with a naive Bayesian learner (Cestnik, 1990).
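The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which discretization method is used, so equal-width binning stands in here as an assumption, and the exemplar weights of Cost and Salzberg's full method are omitted. All function names (`equal_width_bins`, `fit_value_class_probs`, `mvdm_distance`, `predict_1nn`) are hypothetical. The distance between two values of an attribute is taken as the sum over classes of |P(c | v1) - P(c | v2)|^k, with the probabilities estimated from training counts.

```python
from collections import Counter, defaultdict


def equal_width_bins(values, n_bins):
    """Return a function mapping a continuous value to a bin index.

    Assumption: equal-width binning is only a stand-in discretizer.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns

    def to_bin(x):
        idx = int((x - lo) / width)
        return min(max(idx, 0), n_bins - 1)  # clamp out-of-range values

    return to_bin


def fit_value_class_probs(X, y):
    """Estimate P(class | value) per attribute from training counts."""
    classes = sorted(set(y))
    probs = []
    for a in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[a]][label] += 1
        probs.append({
            value: [c[cls] / sum(c.values()) for cls in classes]
            for value, c in counts.items()
        })
    return probs, classes


def mvdm_distance(u, v, probs, n_classes, k=1):
    """Sum per-attribute value differences (unweighted sketch).

    Assumption: a value never seen in training gets an all-zero vector.
    """
    zero = [0.0] * n_classes
    total = 0.0
    for a, (ua, va) in enumerate(zip(u, v)):
        pu = probs[a].get(ua, zero)
        pv = probs[a].get(va, zero)
        total += sum(abs(p - q) ** k for p, q in zip(pu, pv))
    return total


def predict_1nn(query, X, y, probs, n_classes):
    """Return the label of the nearest stored instance (IB1-style 1-NN)."""
    best = min(range(len(X)),
               key=lambda i: mvdm_distance(query, X[i], probs, n_classes))
    return y[best]


# Toy data: one continuous attribute and one nominal attribute.
raw = [(1.0, "a"), (1.5, "a"), (8.0, "b"), (9.0, "b")]
labels = ["neg", "neg", "pos", "pos"]
to_bin = equal_width_bins([r[0] for r in raw], n_bins=2)
X = [(to_bin(x), s) for x, s in raw]
probs, classes = fit_value_class_probs(X, labels)
print(predict_1nn((to_bin(8.5), "b"), X, labels, probs, len(classes)))
```

Once the continuous attribute is mapped to bin indices, both attributes go through the same value-difference computation, which is the sense in which the two attribute types are treated uniformly in this sketch.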
Similar resources
Value Difference Metrics for Continuously Valued Attributes
Nearest neighbor and instance-based learning techniques typically handle continuous and linear input values well, but often do not handle symbolic input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between symbolic attribute values, but it largely ignores continuous attributes, using discretization to map continuous values into symb...
Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning
Since most real-world applications of classification learning involve continuous-valued attributes, properly addressing the discretization process is an important problem. This paper addresses the use of the entropy minimization heuristic for discretizing the range of a continuous-valued attribute into multiple intervals. We briefly present theoretical evidence for the appropriateness of this h...
An Evolutionary Algorithm Integrating Discretization of Continuous-valued Attributes with Learning Decision Rules
A new method of learning decision rules from databases, which uses an evolutionary algorithm, is proposed. The main difference between our approach and the others described in the literature is the way continuous-valued attributes are processed. Most decision rule learners process these attributes separately when searching for threshold values, which may decrease the performance. In contrast ...
Dynamic Discretization of Continuous Attributes
Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision trees, on the other hand, require sorting operations to deal with continuous attributes, which greatly increase learning times. This paper presents a new method of discretization, whose main char...
Improved Heterogeneous Distance Functions
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This pa...
Publication date: 1994